Limiting Result Cardinalities for Multidatabase Queries Using Histograms

Authors

  • Kai-Uwe Sattler
  • Oliver Dunemann
  • Ingolf Geist
  • Gunter Saake
  • Stefan Conrad

Abstract

Integrating, cleaning and analyzing data from heterogeneous sources is often complicated by the large amounts of data and its physical distribution, which can result in poor query response times. One approach to speeding up processing is to reduce the cardinality of results, either by querying only the first tuples or by obtaining a sample for further processing. In this paper we address the processing of such queries in a multidatabase environment. We discuss implementations of the query operators, strategies for their placement in a query plan and, in particular, the use of histograms for estimating attribute value distributions and result cardinalities in order to parameterize the operators.
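
The abstract mentions using histograms to estimate result cardinalities so that cardinality-limiting operators can be parameterized. The Python sketch below illustrates one common way this can work. It is not the paper's own algorithm. It assumes an equi-width histogram and a uniform spread of values within each bucket. For a query asking for the first N tuples in descending order of a numeric attribute, it derives a cutoff c so that a source could evaluate `attr >= c` and return roughly N tuples instead of shipping everything. All names (`Bucket`, `build_equi_width`, `estimate_at_least`, `cutoff_for_top_n`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bucket:
    low: float   # lower bucket boundary
    high: float  # upper bucket boundary
    count: int   # number of tuples whose attribute value falls in this bucket


def build_equi_width(values: List[float], num_buckets: int) -> List[Bucket]:
    """Build an equi-width histogram over a numeric attribute (hypothetical helper)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_buckets or 1.0  # guard against all-equal values
    buckets = [Bucket(lo + i * width, lo + (i + 1) * width, 0)
               for i in range(num_buckets)]
    for v in values:
        # The maximum value maps to index num_buckets, so clamp it into the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        buckets[idx].count += 1
    return buckets


def estimate_at_least(buckets: List[Bucket], c: float) -> float:
    """Estimated number of tuples with attr >= c, assuming uniform spread per bucket."""
    total = 0.0
    for b in buckets:
        if c <= b.low:
            total += b.count  # bucket lies entirely at or above c
        elif c < b.high:
            total += b.count * (b.high - c) / (b.high - b.low)  # partial overlap
    return total


def cutoff_for_top_n(buckets: List[Bucket], n: int) -> float:
    """Estimated largest c such that about n tuples satisfy attr >= c.

    A source could then evaluate 'WHERE attr >= c' instead of returning all tuples.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    acc = 0.0
    for b in reversed(buckets):  # walk from the highest values downwards
        if acc + b.count >= n:
            needed = n - acc
            # Interpolate inside the bucket under the uniformity assumption.
            return b.high - (b.high - b.low) * needed / b.count
        acc += b.count
    return buckets[0].low  # fewer than n tuples in total: request everything


if __name__ == "__main__":
    hist = build_equi_width([float(v) for v in range(100)], num_buckets=10)
    c = cutoff_for_top_n(hist, 15)
    print(f"cutoff = {c:.2f}, estimated tuples = {estimate_at_least(hist, c):.1f}")
```

For the values 0 to 99 in ten buckets, the cutoff for N = 15 lands near 84.15, and the values 85 through 99 are exactly 15 tuples. On skewed data the uniformity assumption can underestimate or overestimate. A limiting operator parameterized this way therefore needs a fallback, such as re-querying with a lower cutoff when too few tuples arrive.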

Similar resources

Query Consolidation: Interpreting a Set of Independent Queries Using a Multidatabase Architecture in the Reverse Direction

We introduce the problem of query consolidation, which seeks to interpret a set of disparate queries submitted to independent databases with a single “global” query. This problem has multiple applications, from improving database design to protecting information from a seemingly innocuous set of apparently unrelated queries. The problem exhibits attractive duality with the much-researched probl...

Query Decomposition, Optimization and Processing in Multidatabase Systems

One way of achieving interoperability among heterogeneous, federated DBMSs is through a multidatabase system that supports a single common data model and a single global query language on top of different types of existing systems. The global schema of a multidatabase system is the result of a schema integration of the schemas exported from the underlying databases, i.e., local databases. A glo...

System P: Completeness-driven Query Answering in Peer Data Management Systems

Peer data management systems (PDMS) are a highly dynamic, decentralized infrastructure for large-scale data integration. They consist of a dynamic set of autonomous peers inter-connected with a network of schema mappings. Queries submitted at a peer are answered with local data and by data that is reached along paths of mappings. Due to redundancies in the mapping network, query answering in PD...

A Bayesian Approach to Estimating the Selectivity of Conjunctive Predicates

Cost-based optimizers in relational databases make use of data statistics to estimate intermediate result cardinalities. Those cardinalities are needed to estimate access plan costs in order to choose the cheapest plan for executing a query. Since statistics are usually collected on single attributes only, the optimizer cannot directly estimate result cardinalities of conjunctive predicates ov...

Consistent Histograms In The Presence of Distinct Value Counts

Self-tuning histograms have been proposed in the past as an attempt to leverage feedback from query execution. However, the focus thus far has been on histograms that only store cardinalities. In this paper, we study consistent histogram construction from query feedback that also takes distinct value counts into account. We first show how the entropy maximization (EM) principle can be leveraged...

Publication date: 2001